Query Expansion with Naive Bayes for Searching Distributed Collections

Authors

  • Hui Yang
  • Minjie Zhang

Abstract

The proliferation of online information resources makes effective and efficient distributed searching increasingly important. However, word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with this fundamental issue. In this paper, we propose a method of query expansion with Naive Bayes to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. The technique not only enhances the discriminatory power of typical queries for choosing the right collections but also, as a result, significantly improves retrieval results.
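
The abstract does not say how the Naive Bayes model chooses expansion terms. The Python sketch below is a rough illustration that assumes one common formulation. Each candidate term t gets the score log P(t) + Σ log P(q_i | t). This treats the query terms q_i as conditionally independent given t, which is the Naive Bayes assumption. The probabilities are estimated from document-level co-occurrence in a small set of feedback documents. The function name `expand_query`, the add-alpha smoothing, the use of feedback documents, and the toy data are all hypothetical. This is not the IISS implementation.

```python
import math
from collections import Counter


def expand_query(query_terms, documents, k=2, alpha=1.0):
    """Hypothetical sketch: return query_terms plus the k candidate terms with
    the highest Naive Bayes score log P(t) + sum_i log P(q_i | t).

    documents: list of token lists (assumed to be feedback documents).
    P(t) is the fraction of documents containing t; P(q_i | t) is estimated
    from document-level co-occurrence with add-alpha (Laplace) smoothing.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    df = Counter(term for ds in doc_sets for term in ds)  # document frequency
    query = set(query_terms)

    scores = {}
    for t, df_t in df.items():
        if t in query:
            continue  # only score terms not already in the query
        score = math.log(df_t / n_docs)  # log prior P(t)
        for q in query_terms:
            co = sum(1 for ds in doc_sets if t in ds and q in ds)
            # Smoothed P(q present | t present); a binary event, hence + 2*alpha
            score += math.log((co + alpha) / (df_t + 2 * alpha))
        scores[t] = score

    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


docs = [
    ["distributed", "retrieval", "collection", "selection"],
    ["collection", "selection", "query", "broker"],
    ["distributed", "search", "collection", "broker"],
    ["naive", "bayes", "classifier", "text"],
]
print(expand_query(["distributed", "collection"], docs, k=2))
# ['distributed', 'collection', 'broker', 'selection']
```

The score adds one factor per query term. As a result, candidates that co-occur with every query term rank above candidates that match only one. This is the intuition behind using expansion to make a query more discriminating when choosing collections.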

Similar resources

Effective Retrieval with Distributed Collections

This paper evaluates the retrieval effectiveness of distributed information retrieval systems in realistic environments. We find that when a large number of collections are available, the retrieval effectiveness is significantly worse than that of centralized systems, mainly because typical queries are not adequate for the purpose of choosing the right collections. We propose two techniques to addr...

Diagnosis of Pulmonary Tuberculosis Using Artificial Intelligence (Naive Bayes Algorithm)

Background and Aim: Despite the implementation of effective preventive and therapeutic programs, no significant success has been achieved in reducing tuberculosis. One of the reasons is the delay in diagnosis. Therefore, a diagnostic aid system can help diagnose tuberculosis early. The purpose of this research was to evaluate the role of the Naive Bayes algorithm as a...

Query Expansion for the Language Modelling Framework Using the Naïve Bayes Assumption

Language modelling is a new form of information retrieval that is rapidly becoming the preferred choice over probabilistic and vector space models, due to the intuitiveness of the model formulation and its effectiveness. The language model assumes that all terms are independent; therefore, the majority of the documents returned to the user will be those that contain the query terms. By making this ...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

CAP7: Searching and Browsing in Distributed Document Collections

This paper describes CAP7, a system for searching and browsing in distributed document (metadata) collections. The system architecture is similar to Harvest, comprising gatherer components and a retrieval engine; but instead of the limited SOIF data format, we use RDF and XML. The gatherer creates RDF metadata descriptions of collected resources. Before delivering the data to the retrieval engi...

Publication date: 2002